This report is for ETC5521 Assignment 1 by Team Echidna, comprising Ruimin Lin and Rahul Bharadwaj.
Board games are a form of leisure that people have enjoyed for a very long time, since long before computers and video games existed, and they have evolved enormously since their inception. Board games give people a way to socialize and relieve stress in a fast-moving society, and they provide extensive exercise for the brain. Given how popular they are, what makes a board game great? Why have board games survived in a world of virtual reality games? In other words, what are the common characteristics of top-ranked board games, and which board games are the best in terms of average rating?
The original board game data used in this report were obtained from the BoardGameGeek database and were cleaned and shared by Thomas Mock (2019).
The tidy data set has 10,532 rows and 22 columns, i.e. 10,532 observations (games) of 22 variables. It includes the maximum/minimum playtime, the maximum/minimum number of players, the minimum player age, the game designer, the game publisher, the game mechanics and more. Note that even though the data set is tidy, the values of variables such as category, family and mechanic are still messy and repetitive: each cell holds several comma-separated labels, which may limit our ability to explore these variables.
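Because columns such as category store several labels in one comma-separated string, a common first step is to split each cell into separate labels and count them. The sketch below is a Python analogue of that idea (in R this is usually done with something like tidyr's separate_rows); it assumes that individual labels never contain a comma themselves, which may not hold for every column (publisher names, for example). The helper names split_labels and count_labels are hypothetical. The first sample string is game 1's category as shown in the variable summary below; the second is made up for illustration.

```python
from collections import Counter


def split_labels(value, sep=","):
    """Split one multi-valued cell (e.g. a category string) into labels.

    Assumes no individual label contains the separator itself.
    A missing value (None, i.e. NA) gives an empty list.
    """
    if value is None:
        return []
    return [label.strip() for label in value.split(sep) if label.strip()]


def count_labels(cells, sep=","):
    """Count how many games carry each label across a whole column."""
    counts = Counter()
    for cell in cells:
        # A set, so a label repeated within one cell counts that game once.
        counts.update(set(split_labels(cell, sep)))
    return counts


# First value: game 1's category; second value: hypothetical; third: NA.
cells = ["Economic,Negotiation,Political", "Economic,Card Game", None]
print(count_labels(cells).most_common(1))  # [('Economic', 2)]
```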
The aim of this exploratory analysis is to find out which factors affect the average rating of board games. This gives insight into which board games are most popular and which characteristics they share. We have therefore formulated the following questions to guide our exploration of the board game data.
Primary Question:
What are the common characteristics of top ranked board games?
Secondary Questions:
The variables included in the data are as follows:
## Rows: 10,532
## Columns: 22
## $ game_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1...
## $ description <chr> "Die Macher is a game about seven sequential politic...
## $ image <chr> "//cf.geekdo-images.com/images/pic159509.jpg", "//cf...
## $ max_players <dbl> 5, 4, 4, 4, 6, 6, 2, 5, 4, 6, 7, 5, 4, 4, 6, 4, 2, 8...
## $ max_playtime <dbl> 240, 30, 60, 60, 90, 240, 20, 120, 90, 60, 45, 60, 1...
## $ min_age <dbl> 14, 12, 10, 12, 12, 12, 8, 12, 13, 10, 13, 12, 10, 1...
## $ min_players <dbl> 3, 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 2, 2...
## $ min_playtime <dbl> 240, 30, 30, 60, 90, 240, 20, 120, 90, 60, 45, 45, 6...
## $ name <chr> "Die Macher", "Dragonmaster", "Samurai", "Tal der Kö...
## $ playing_time <dbl> 240, 30, 60, 60, 90, 240, 20, 120, 90, 60, 45, 60, 1...
## $ thumbnail <chr> "//cf.geekdo-images.com/images/pic159509_t.jpg", "//...
## $ year_published <dbl> 1986, 1981, 1998, 1992, 1964, 1989, 1978, 1993, 1998...
## $ artist <chr> "Marcus Gschwendtner", "Bob Pepper", "Franz Vohwinke...
## $ category <chr> "Economic,Negotiation,Political", "Card Game,Fantasy...
## $ compilation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "CAT...
## $ designer <chr> "Karl-Heinz Schmiel", "G. W. \"Jerry\" D'Arcey", "Re...
## $ expansion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Elfengold,Elfen...
## $ family <chr> "Country: Germany,Valley Games Classic Line", "Anima...
## $ mechanic <chr> "Area Control / Area Influence,Auction/Bidding,Dice ...
## $ publisher <chr> "Hans im Glück Verlags-GmbH,Moskito Spiele,Valley Ga...
## $ average_rating <dbl> 7.66508, 6.60815, 7.44119, 6.60675, 7.35830, 6.52534...
## $ users_rated <dbl> 4498, 478, 12019, 314, 15195, 73, 2751, 186, 1263, 6...
The variables and their types are explained below to give a better understanding of the board game data set.
game_id: ID of a particular game. It is stored as a double vector in the output above, but it should be treated as a character (categorical) vector.
description: Game description, a character vector.
image: URL image of the game, a character vector.
max_players/min_players: maximum/minimum number of recommended players, double vectors.
max_playtime/min_playtime: maximum/minimum recommended playtime in minutes, double vectors.
min_age: recommended minimum player age, a double vector.
name: name of the game, a character vector.
playing_time: average playtime of a game in minutes, a double vector.
thumbnail: URL thumbnail of the game, a character vector.
year_published: year the game was published, a double vector.
artist: artist for game art, a character vector.
category: categories of the game, a character vector.
compilation: name of compilation, a character vector.
designer: game designer, a character vector.
expansion: name of expansion pack (if any), a character vector.
family: family of the game (equivalent to a publisher), a character vector.
mechanic: how the game is played, a character vector.
publisher: company/person who published the game, a character vector.
average_rating: average rating on BoardGameGeek, on a scale from 1 to 10, a double vector.
users_rated: number of users who rated the game, a double vector.
To ensure the reliability of the ratings, the data are limited to games with at least 50 user ratings that were published between 1950 and 2016. The site’s full database has more than 90,000 games with crowd-sourced ratings.
The original board game data set consists of 90,400 observations of 80 variables, so data cleaning and wrangling were necessary before analysis. Thomas Mock replaced long, complicated variable names in the original data, such as details.description, with shorter ones such as description using janitor::clean_names and set_names, which keeps the code readable. He also removed around 50 variables with the select function, leaving 27 variables at this stage.
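To make the renaming step concrete, here is a rough Python approximation of what it does: convert each name to snake_case (as janitor::clean_names does, turning details.description into details_description) and then strip a leading prefix. This is a sketch under assumptions, not Thomas Mock's code: the prefix list ("details_",) and the raw name attributes.total are hypothetical, and the real clean_names also handles camelCase, duplicate names and other cases that this function ignores.

```python
import re


def snake_case(name):
    """Very rough stand-in for janitor::clean_names() on a single name:
    replace runs of non-alphanumeric characters with '_' and lower-case.
    (The real function also splits camelCase, dedupes names, etc.)"""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


def clean_names(names, drop_prefixes=("details_",)):
    """Snake-case each name, then strip the first matching prefix.

    drop_prefixes is a hypothetical choice for this sketch.
    """
    cleaned = []
    for name in names:
        new = snake_case(name)
        for prefix in drop_prefixes:
            if new.startswith(prefix):
                new = new[len(prefix):]
                break
        cleaned.append(new)
    return cleaned


print(clean_names(["details.description", "attributes.total"]))
# ['description', 'attributes_total']
```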
The data set is then filtered to board games published from 1950 to 2016 with at least 50 user ratings, and games with NA values in year_published are omitted. Finally, variables that are unlikely to be useful for the analysis, such as attributes_total and game_type, are excluded. This leaves a tidy data set of 10,532 observations and 22 variables that is concise and convenient for further exploration.
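The filtering rules above can be sketched in Python as follows, assuming (for this sketch only) that each game is a dict keyed by the variable names; the report itself applies the filter to an R data frame. Games 1 and 6 use the year_published and users_rated values shown in the variable summary above; the other three rows are hypothetical and exist only to exercise each rule.

```python
def filter_games(games, first_year=1950, last_year=2016, min_ratings=50):
    """Keep games with a known year_published in [first_year, last_year]
    and at least min_ratings user ratings."""
    kept = []
    for game in games:
        year = game.get("year_published")
        if year is None:  # NA year_published is dropped
            continue
        if first_year <= year <= last_year and game.get("users_rated", 0) >= min_ratings:
            kept.append(game)
    return kept


games = [
    {"game_id": 1, "year_published": 1986, "users_rated": 4498},   # from the data
    {"game_id": 6, "year_published": 1989, "users_rated": 73},     # from the data
    {"game_id": 101, "year_published": None, "users_rated": 900},  # hypothetical: NA year
    {"game_id": 102, "year_published": 1932, "users_rated": 300},  # hypothetical: too old
    {"game_id": 103, "year_published": 2010, "users_rated": 12},   # hypothetical: too few ratings
]
print([g["game_id"] for g in filter_games(games)])  # [1, 6]
```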
| name | average_rating | max_playtime (min) | min_playtime (min) | max_players | min_players |
|---|---|---|---|---|---|
| Small World Designer Edition | 9.00392 | 80 | 40 | 6 | 2 |
| Kingdom Death: Monster | 8.93184 | 180 | 60 | 6 | 1 |
| Terra Mystica: Big Box | 8.84862 | 150 | 60 | 5 | 2 |
| Last Chance for Victory | 8.84603 | 60 | 60 | 2 | 2 |
| The Greatest Day: Sword, Juno, and Gold Beaches | 8.83081 | 6000 | 60 | 8 | 2 |
| Last Blitzkrieg | 8.80263 | 960 | 180 | 4 | 2 |
| Enemy Action: Ardennes | 8.75802 | 600 | 0 | 2 | 1 |
| Through the Ages: A New Story of Civilization | 8.74235 | 240 | 180 | 4 | 2 |
| 1817 | 8.70848 | 540 | 360 | 7 | 3 |
| Pandemic Legacy: Season 1 | 8.66878 | 60 | 60 | 4 | 2 |
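The table above lists the highest-rated games. A minimal sketch of how such a ranking can be produced is to sort by average_rating in descending order and keep the first n rows. This is a Python analogue that assumes each game is a dict (the helper top_n is hypothetical); the sample rows are copied from the table and deliberately shuffled.

```python
def top_n(games, n=10, key="average_rating"):
    """Return the n games with the highest value of `key`, best first."""
    return sorted(games, key=lambda g: g[key], reverse=True)[:n]


# A few rows from the table above, out of order.
games = [
    {"name": "Pandemic Legacy: Season 1", "average_rating": 8.66878},
    {"name": "Small World Designer Edition", "average_rating": 9.00392},
    {"name": "1817", "average_rating": 8.70848},
    {"name": "Kingdom Death: Monster", "average_rating": 8.93184},
]

for game in top_n(games, n=2):
    print(game["name"], game["average_rating"])
```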
Some of the top-50 ranked board games have extreme maximum playtimes:
The Greatest Day: Sword, Juno, and Gold Beaches, with a maximum playtime of 6000 minutes and an average rating of 8.8308
Axis Empires: Totaler Krieg!, with a maximum playtime of 3600 minutes and an average rating of 8.4194
Beyond the Rhine, with a maximum playtime of 3000 minutes and an average rating of 8.5979
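It is difficult to examine trends or common characteristics with these outliers present. We therefore removed outliers using the IQR rule, which flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. For maximum playtime, the summary below gives Q1 = 82.5 and Q3 = 345, so the fences are -311.25 and 738.75 minutes. Since the lower fence is negative, only the upper fence has an effect, limiting maximum playtime to less than 738.75 minutes.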
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 82.5 142.5 461.1 345.0 6000.0
## [1] -311.25
## [1] 738.75
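The fence arithmetic can be checked with a short Python sketch. iqr_fences reproduces the two fences printed above from the quartiles in the summary. remove_iqr_outliers applies the rule to raw values; it uses statistics.quantiles with method="inclusive", which interpolates linearly between order statistics in the same way as R's default quantile type (type 7) used by summary(). Both helper names are hypothetical, and the toy list in the test is illustrative only.

```python
import statistics


def iqr_fences(q1, q3, k=1.5):
    """Outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(values, k=1.5):
    """Keep only values inside the IQR fences (inclusive).

    method="inclusive" matches R's default quantile(type = 7).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if lower <= v <= upper]


print(iqr_fences(82.5, 345.0))  # (-311.25, 738.75)  max_playtime quartiles
print(iqr_fences(45.0, 180.0))  # (-157.5, 382.5)    min_playtime quartiles
```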
With these outliers removed, the graph of average rating against maximum playtime gives a clearer picture of where the top-50 ranked board games lie: the majority have a maximum playtime of 200 minutes or less.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 45.0 60.0 247.5 180.0 3600.0
## [1] -157.5
## [1] 382.5
We applied the same IQR method to remove outliers in minimum playtime (Q1 = 45 and Q3 = 180 give fences of -157.5 and 382.5 minutes). The resulting graph shows that most of the top-50 ranked board games have a minimum playtime of less than 100 minutes.
In the scatterplot of average rating against the minimum number of players, we observed that most of the top-50 board games require at least 2 players.
In the scatterplot of average rating against the maximum number of players, we observed that most of the top-50 board games allow a maximum of 4 or 5 players.
In the scatterplot of average rating against minimum player age, we observed that the minimum age set by the majority of these games is between 10 and 15.
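The "most games have..." statements above can be backed by a simple frequency count alongside each scatterplot. The Python sketch below only illustrates the tabulation: it uses the ten (min_players, max_players) pairs from the top-10 table earlier in this section, not the full top-50 set behind the plots, so its counts are not the report's result.

```python
from collections import Counter

# (min_players, max_players) for the ten games in the top-10 table.
players = [(2, 6), (1, 6), (2, 5), (2, 2), (2, 8),
           (2, 4), (1, 2), (2, 4), (3, 7), (2, 4)]

min_counts = Counter(lo for lo, _ in players)
max_counts = Counter(hi for _, hi in players)

print(min_counts.most_common(1))  # [(2, 7)]
print(max_counts.most_common(1))  # [(4, 3)]
```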
BoardGameGeek. (2000). BoardGameGeek | Gaming Unplugged Since 2000. https://boardgamegeek.com/BoardGameGeek
Mock, T. (2019). Tidy Tuesday. https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-03-12